Overview

Dataset statistics

Number of variables: 21
Number of observations: 5507751
Missing cells: 39301223
Missing cells (%): 34.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 882.4 MiB
Average record size in memory: 168.0 B
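
The derived figures in this table follow from the raw counts. As a quick consistency check, the sketch below recomputes the missing-cell percentage and the average record size; the variable names are illustrative and the values are copied from the table above.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 21
n_observations = 5_507_751
missing_cells = 39_301_223
total_size_mib = 882.4

# Every observation has one cell per variable.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024**2 / n_observations

print(f"Total cells: {total_cells}")
print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Average record size: {avg_record_bytes:.1f} B")
```

Both recomputed values round to the figures shown in the report (34.0% and 168.0 B).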

Variable types

Numeric: 4
DateTime: 2
Text: 3
Categorical: 12

Alerts

PERSON_TYPE is highly imbalanced (85.9%) [Imbalance]
PERSON_INJURY is highly imbalanced (65.7%) [Imbalance]
EJECTION is highly imbalanced (93.3%) [Imbalance]
EMOTIONAL_STATUS is highly imbalanced (74.4%) [Imbalance]
BODILY_INJURY is highly imbalanced (68.6%) [Imbalance]
POSITION_IN_VEHICLE is highly imbalanced (52.8%) [Imbalance]
SAFETY_EQUIPMENT is highly imbalanced (60.2%) [Imbalance]
COMPLAINT is highly imbalanced (75.2%) [Imbalance]
VEHICLE_ID has 224838 (4.1%) missing values [Missing]
PERSON_AGE has 599276 (10.9%) missing values [Missing]
EJECTION has 2679259 (48.6%) missing values [Missing]
EMOTIONAL_STATUS has 2592938 (47.1%) missing values [Missing]
BODILY_INJURY has 2592895 (47.1%) missing values [Missing]
POSITION_IN_VEHICLE has 2678865 (48.6%) missing values [Missing]
SAFETY_EQUIPMENT has 2861973 (52.0%) missing values [Missing]
PED_LOCATION has 5416437 (98.3%) missing values [Missing]
PED_ACTION has 5416538 (98.3%) missing values [Missing]
COMPLAINT has 2592888 (47.1%) missing values [Missing]
PED_ROLE has 194889 (3.5%) missing values [Missing]
CONTRIBUTING_FACTOR_1 has 5417789 (98.4%) missing values [Missing]
CONTRIBUTING_FACTOR_2 has 5417906 (98.4%) missing values [Missing]
PERSON_SEX has 614713 (11.2%) missing values [Missing]
PERSON_AGE is highly skewed (γ1 = 72.22093394) [Skewed]
UNIQUE_ID has unique values [Unique]
PERSON_AGE has 547401 (9.9%) zeros [Zeros]

Reproduction

Analysis started: 2024-10-29 14:05:50.687965
Analysis finished: 2024-10-29 14:08:29.697496
Duration: 2 minutes and 39.01 seconds
Software version: ydata-profiling v0.0.dev0
Download configuration: config.json
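
The reported duration can be recovered from the two timestamps. The sketch below uses only the Python standard library; the timestamps are copied from the block above and the output format merely imitates the report's wording.

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" block.
started = datetime.fromisoformat("2024-10-29 14:05:50.687965")
finished = datetime.fromisoformat("2024-10-29 14:08:29.697496")

# Elapsed wall-clock time in seconds, split into minutes and seconds.
elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)

print(f"{int(minutes)} minutes and {seconds:.2f} seconds")
```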

Variables

UNIQUE_ID
Real number (ℝ)

UNIQUE 

Distinct: 5507751
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9148413.7
Minimum: 10922
Maximum: 13187099
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 42.0 MiB

Quantile statistics

Minimum: 10922
5-th percentile: 5811028.5
Q1: 7019331.5
Median: 9412847
Q3: 11508674
95-th percentile: 12833888
Maximum: 13187099
Range: 13176177
Interquartile range (IQR): 4489342

Descriptive statistics

Standard deviation: 2665919.8
Coefficient of variation (CV): 0.29140788
Kurtosis: -0.13772565
Mean: 9148413.7
Median Absolute Deviation (MAD): 2293893
Skewness: -0.47583741
Sum: 5.0387185 × 10^13
Variance: 7.1071284 × 10^12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
10249006 | 1 | < 0.1%
11357975 | 1 | < 0.1%
11357395 | 1 | < 0.1%
11357768 | 1 | < 0.1%
11356878 | 1 | < 0.1%
11359067 | 1 | < 0.1%
11358163 | 1 | < 0.1%
11357862 | 1 | < 0.1%
11357054 | 1 | < 0.1%
11356155 | 1 | < 0.1%
Other values (5507741) | 5507741 | > 99.9%

Minimum 10 values
Value | Count | Frequency (%)
10922 | 1 | < 0.1%
79660 | 1 | < 0.1%
79953 | 1 | < 0.1%
79954 | 1 | < 0.1%
81004 | 1 | < 0.1%
81073 | 1 | < 0.1%
81886 | 1 | < 0.1%
82012 | 1 | < 0.1%
82146 | 1 | < 0.1%
82227 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
13187099 | 1 | < 0.1%
13187098 | 1 | < 0.1%
13187097 | 1 | < 0.1%
13187096 | 1 | < 0.1%
13187095 | 1 | < 0.1%
13187094 | 1 | < 0.1%
13187093 | 1 | < 0.1%
13187092 | 1 | < 0.1%
13187091 | 1 | < 0.1%
13187090 | 1 | < 0.1%

COLLISION_ID
Real number (ℝ)

Distinct: 1499799
Distinct (%): 27.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3971172.3
Minimum: 37
Maximum: 4766163
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 42.0 MiB

Quantile statistics

Minimum: 37
5-th percentile: 3425161.5
Q1: 3687055
Median: 4021234
Q3: 4371898
95-th percentile: 4687011.5
Maximum: 4766163
Range: 4766126
Interquartile range (IQR): 684843

Descriptive statistics

Standard deviation: 655148.48
Coefficient of variation (CV): 0.16497609
Kurtosis: 17.403828
Mean: 3971172.3
Median Absolute Deviation (MAD): 341649
Skewness: -3.3930529
Sum: 2.1872228 × 10^13
Variance: 4.2921953 × 10^11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
3963775 | 77 | < 0.1%
4691158 | 71 | < 0.1%
3591272 | 66 | < 0.1%
3539636 | 65 | < 0.1%
3504309 | 64 | < 0.1%
3571716 | 62 | < 0.1%
3904409 | 61 | < 0.1%
3691734 | 61 | < 0.1%
3449201 | 60 | < 0.1%
4143411 | 60 | < 0.1%
Other values (1499789) | 5507104 | > 99.9%

Minimum 10 values
Value | Count | Frequency (%)
37 | 1 | < 0.1%
39 | 1 | < 0.1%
40 | 1 | < 0.1%
44 | 1 | < 0.1%
52 | 1 | < 0.1%
55 | 2 | < 0.1%
78 | 1 | < 0.1%
79 | 2 | < 0.1%
104 | 1 | < 0.1%
107 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
4766163 | 3 | < 0.1%
4766160 | 2 | < 0.1%
4766157 | 6 | < 0.1%
4766156 | 2 | < 0.1%
4766155 | 4 | < 0.1%
4766154 | 3 | < 0.1%
4766152 | 4 | < 0.1%
4766151 | 3 | < 0.1%
4766150 | 3 | < 0.1%
4766148 | 10 | < 0.1%

Distinct: 4497
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 42.0 MiB
Minimum: 2012-07-01 00:00:00
Maximum: 2024-10-22 00:00:00
Histogram with fixed size bins (bins=50)

Distinct: 1440
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 42.0 MiB
Minimum: 2024-10-29 00:00:00
Maximum: 2024-10-29 23:59:00
Histogram with fixed size bins (bins=50)
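
The second DateTime variable spans only 2024-10-29 00:00:00 to 2024-10-29 23:59:00, the day the analysis ran, and has 1440 distinct values, which equals the number of minutes in a day. This pattern is consistent with a time-of-day column that was parsed without a date, although the report does not say so. Under that assumption, one way to obtain a single usable timestamp is to combine the date and time strings before profiling. The sketch below is hypothetical: the function name and the `%m/%d/%Y` and `%H:%M` input formats are assumptions about the raw data, not something the report shows.

```python
from datetime import datetime


def combine_date_time(date_text: str, time_text: str) -> datetime:
    """Merge separate date and time strings into one timestamp.

    The "%m/%d/%Y" and "%H:%M" formats are assumed; adjust them to
    match the raw columns.
    """
    date_part = datetime.strptime(date_text, "%m/%d/%Y").date()
    time_part = datetime.strptime(time_text, "%H:%M").time()
    return datetime.combine(date_part, time_part)


# A day has 24 * 60 = 1440 distinct minute values.
minutes_per_day = 24 * 60

print(combine_date_time("10/22/2024", "23:59"))
```

Profiling the combined timestamp would avoid the artificial 2024-10-29 date shown above.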

Distinct: 5312928
Distinct (%): 96.5%
Missing: 19
Missing (%): < 0.1%
Memory size: 42.0 MiB

Length

Max length: 36
Median length: 36
Mean length: 30.558888
Min length: 1

Characters and Unicode

Total characters: 168310166
Distinct characters: 17
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5312896
Unique (%): 96.5%

Sample

1st row: 31aa2bc0-f545-444f-8cdb-f1cb5cf00b89
2nd row: 4629e500-a73e-48dc-b8fb-53124d124b80
3rd row: ae48c136-1383-45db-83f4-2a5eecfb7cff
4th row: 2782525
5th row: e038e18f-40fb-4471-99cf-345eae36e064
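
The sample rows mix two identifier styles: 36-character hyphenated hexadecimal strings and short all-digit strings. The 17 distinct characters reported above match the digits 0-9, the letters a-f, and the hyphen. A minimal sketch for separating the two styles follows; the function name and the three labels are illustrative choices, not part of the report.

```python
import re

# 8-4-4-4-12 lowercase hexadecimal groups, as seen in the sample rows.
UUID_PATTERN = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def classify_id(value: str) -> str:
    """Label an identifier as 'uuid', 'numeric', or 'other'."""
    if UUID_PATTERN.fullmatch(value):
        return "uuid"
    if value.isdigit():
        return "numeric"
    return "other"


samples = [
    "31aa2bc0-f545-444f-8cdb-f1cb5cf00b89",
    "2782525",
]
for sample in samples:
    print(sample, "->", classify_id(sample))
```

Counting the labels over the full column would show how many rows use each identifier style.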

Value | Count | Frequency (%)
1 | 142787 | 2.6%
2 | 31734 | 0.6%
3 | 11543 | 0.2%
4 | 4672 | 0.1%
5 | 2005 | < 0.1%
6 | 923 | < 0.1%
7 | 448 | < 0.1%
8 | 235 | < 0.1%
9 | 149 | < 0.1%
10 | 91 | < 0.1%
Other values (5312918) | 5313145 | 96.5%

Most occurring characters

ValueCountFrequency (%)
- 18092964
 
10.7%
4 13474881
 
8.0%
9 10087515
 
6.0%
8 10076947
 
6.0%
a 9611356
 
5.7%
b 9611229
 
5.7%
1 9367153
 
5.6%
2 9234786
 
5.5%
3 9021796
 
5.4%
7 8954703
 
5.3%
Other values (7) 60776836
36.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 97068925
57.7%
Lowercase Letter 53148277
31.6%
Dash Punctuation 18092964
 
10.7%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
4 13474881
13.9%
9 10087515
10.4%
8 10076947
10.4%
1 9367153
9.7%
2 9234786
9.5%
3 9021796
9.3%
7 8954703
9.2%
6 8954668
9.2%
0 8948906
9.2%
5 8947570
9.2%
Lowercase Letter
ValueCountFrequency (%)
a 9611356
18.1%
b 9611229
18.1%
d 8488286
16.0%
f 8480361
16.0%
e 8478542
16.0%
c 8478503
16.0%
Dash Punctuation
ValueCountFrequency (%)
- 18092964
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 115161889
68.4%
Latin 53148277
31.6%

Most frequent character per script

Common
ValueCountFrequency (%)
- 18092964
15.7%
4 13474881
11.7%
9 10087515
8.8%
8 10076947
8.8%
1 9367153
8.1%
2 9234786
8.0%
3 9021796
7.8%
7 8954703
7.8%
6 8954668
7.8%
0 8948906
7.8%
Latin
ValueCountFrequency (%)
a 9611356
18.1%
b 9611229
18.1%
d 8488286
16.0%
f 8480361
16.0%
e 8478542
16.0%
c 8478503
16.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 168310166
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
- 18092964
 
10.7%
4 13474881
 
8.0%
9 10087515
 
6.0%
8 10076947
 
6.0%
a 9611356
 
5.7%
b 9611229
 
5.7%
1 9367153
 
5.6%
2 9234786
 
5.5%
3 9021796
 
5.4%
7 8954703
 
5.3%
Other values (7) 60776836
36.1%

PERSON_TYPE
Categorical

IMBALANCE 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 42.0 MiB

Occupant: 5294767
Pedestrian: 131361
Bicyclist: 71227
Other Motorized: 10396

Length

Max length: 15
Median length: 8
Mean length: 8.0738452
Min length: 8

Characters and Unicode

Total characters: 44468729
Distinct characters: 21
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Occupant
2nd row: Occupant
3rd row: Occupant
4th row: Occupant
5th row: Occupant

Common Values

Value | Count | Frequency (%)
Occupant | 5294767 | 96.1%
Pedestrian | 131361 | 2.4%
Bicyclist | 71227 | 1.3%
Other Motorized | 10396 | 0.2%
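
The imbalance percentages in the Alerts list appear consistent with one minus the normalized Shannon entropy of the category counts. This is a reconstruction from the numbers, not a formula quoted by the report. The sketch below applies that assumed definition to the PERSON_TYPE counts above; the function name is illustrative.

```python
import math
from typing import Sequence


def imbalance_score(counts: Sequence[int]) -> float:
    """Return 1 - H / log(k), where H is the Shannon entropy of the
    class frequencies and k is the number of classes.

    0 means perfectly balanced; values near 1 mean one class dominates.
    This definition is an assumption inferred from the report's figures.
    """
    n_classes = len(counts)
    if n_classes < 2:
        return 0.0
    total = sum(counts)
    entropy = -sum(
        (c / total) * math.log(c / total) for c in counts if c > 0
    )
    return 1 - entropy / math.log(n_classes)


# Counts copied from the PERSON_TYPE "Common Values" table.
person_type_counts = [5294767, 131361, 71227, 10396]
score = imbalance_score(person_type_counts)
print(f"PERSON_TYPE imbalance: {100 * score:.1f}%")  # 85.9%, as in the alert
```

The same function applied to the PERSON_INJURY counts (4829123, 675365, 3263) gives 65.7%, which also matches the alert.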

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
occupant | 5294767 | 96.0%
pedestrian | 131361 | 2.4%
bicyclist | 71227 | 1.3%
other | 10396 | 0.2%
motorized | 10396 | 0.2%
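
The word counts above can be reproduced from the category counts by lowercasing each label and splitting it on whitespace, so that "Other Motorized" contributes to both "other" and "motorized". The sketch below shows this for PERSON_TYPE. The tokenization rule is an assumption that happens to fit this variable; the report's own tokenizer may treat punctuation differently.

```python
from collections import Counter

# Counts copied from the PERSON_TYPE "Common Values" table.
category_counts = {
    "Occupant": 5294767,
    "Pedestrian": 131361,
    "Bicyclist": 71227,
    "Other Motorized": 10396,
}

# Each label contributes its row count to every word it contains.
word_counts = Counter()
for label, count in category_counts.items():
    for word in label.lower().split():
        word_counts[word] += count

total_words = sum(word_counts.values())
for word, count in word_counts.most_common():
    print(f"{word} | {count} | {100 * count / total_words:.1f}%")
```

Because the two-word label is counted twice, the word total (5518147) exceeds the row total (5507751), which is why "occupant" shows 96.0% here but 96.1% in the Common Values table.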

Most occurring characters

ValueCountFrequency (%)
c 10731988
24.1%
t 5518147
12.4%
a 5426128
12.2%
n 5426128
12.2%
O 5305163
11.9%
u 5294767
11.9%
p 5294767
11.9%
i 284211
 
0.6%
e 283514
 
0.6%
s 202588
 
0.5%
Other values (11) 701328
 
1.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 38940186
87.6%
Uppercase Letter 5518147
 
12.4%
Space Separator 10396
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
c 10731988
27.6%
t 5518147
14.2%
a 5426128
13.9%
n 5426128
13.9%
u 5294767
13.6%
p 5294767
13.6%
i 284211
 
0.7%
e 283514
 
0.7%
s 202588
 
0.5%
r 152153
 
0.4%
Other values (6) 325795
 
0.8%
Uppercase Letter
ValueCountFrequency (%)
O 5305163
96.1%
P 131361
 
2.4%
B 71227
 
1.3%
M 10396
 
0.2%
Space Separator
ValueCountFrequency (%)
10396
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 44458333
> 99.9%
Common 10396
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
c 10731988
24.1%
t 5518147
12.4%
a 5426128
12.2%
n 5426128
12.2%
O 5305163
11.9%
u 5294767
11.9%
p 5294767
11.9%
i 284211
 
0.6%
e 283514
 
0.6%
s 202588
 
0.5%
Other values (10) 690932
 
1.6%
Common
ValueCountFrequency (%)
10396
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 44468729
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
c 10731988
24.1%
t 5518147
12.4%
a 5426128
12.2%
n 5426128
12.2%
O 5305163
11.9%
u 5294767
11.9%
p 5294767
11.9%
i 284211
 
0.6%
e 283514
 
0.6%
s 202588
 
0.5%
Other values (11) 701328
 
1.6%

PERSON_INJURY
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 42.0 MiB

Unspecified: 4829123
Injured: 675365
Killed: 3263

Length

Max length: 11
Median length: 11
Mean length: 10.506554
Min length: 6

Characters and Unicode

Total characters: 57867486
Distinct characters: 15
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Unspecified
2nd row: Unspecified
3rd row: Unspecified
4th row: Unspecified
5th row: Unspecified

Common Values

Value | Count | Frequency (%)
Unspecified | 4829123 | 87.7%
Injured | 675365 | 12.3%
Killed | 3263 | 0.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
unspecified | 4829123 | 87.7%
injured | 675365 | 12.3%
killed | 3263 | 0.1%

Most occurring characters

ValueCountFrequency (%)
e 10336874
17.9%
i 9661509
16.7%
d 5507751
9.5%
n 5504488
9.5%
U 4829123
8.3%
s 4829123
8.3%
p 4829123
8.3%
c 4829123
8.3%
f 4829123
8.3%
I 675365
 
1.2%
Other values (5) 2035884
 
3.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 52359735
90.5%
Uppercase Letter 5507751
 
9.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 10336874
19.7%
i 9661509
18.5%
d 5507751
10.5%
n 5504488
10.5%
s 4829123
9.2%
p 4829123
9.2%
c 4829123
9.2%
f 4829123
9.2%
j 675365
 
1.3%
u 675365
 
1.3%
Other values (2) 681891
 
1.3%
Uppercase Letter
ValueCountFrequency (%)
U 4829123
87.7%
I 675365
 
12.3%
K 3263
 
0.1%

Most occurring scripts

ValueCountFrequency (%)
Latin 57867486
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 10336874
17.9%
i 9661509
16.7%
d 5507751
9.5%
n 5504488
9.5%
U 4829123
8.3%
s 4829123
8.3%
p 4829123
8.3%
c 4829123
8.3%
f 4829123
8.3%
I 675365
 
1.2%
Other values (5) 2035884
 
3.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 57867486
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 10336874
17.9%
i 9661509
16.7%
d 5507751
9.5%
n 5504488
9.5%
U 4829123
8.3%
s 4829123
8.3%
p 4829123
8.3%
c 4829123
8.3%
f 4829123
8.3%
I 675365
 
1.2%
Other values (5) 2035884
 
3.5%

VEHICLE_ID
Real number (ℝ)

MISSING 

Distinct: 2550256
Distinct (%): 48.3%
Missing: 224838
Missing (%): 4.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 18582262
Minimum: 123423
Maximum: 20771082
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 42.0 MiB

Quantile statistics

Minimum: 123423
5-th percentile: 17037408
Q1: 17563631
Median: 18729960
Q3: 19800448
95-th percentile: 20567251
Maximum: 20771082
Range: 20647659
Interquartile range (IQR): 2236817

Descriptive statistics

Standard deviation: 1580138.1
Coefficient of variation (CV): 0.085034754
Kurtosis: 7.5897347
Mean: 18582262
Median Absolute Deviation (MAD): 1125663
Skewness: -1.8170387
Sum: 9.8168472 × 10^13
Variance: 2.4968363 × 10^12
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
18590796 | 71 | < 0.1%
17075216 | 63 | < 0.1%
17334601 | 63 | < 0.1%
17364088 | 60 | < 0.1%
18954743 | 58 | < 0.1%
18968693 | 58 | < 0.1%
17483298 | 58 | < 0.1%
17826063 | 58 | < 0.1%
17521817 | 57 | < 0.1%
19106096 | 57 | < 0.1%
Other values (2550246) | 5282310 | 95.9%
(Missing) | 224838 | 4.1%

Minimum 10 values
Value | Count | Frequency (%)
123423 | 1 | < 0.1%
602947 | 2 | < 0.1%
611686 | 1 | < 0.1%
620307 | 1 | < 0.1%
621082 | 2 | < 0.1%
622848 | 3 | < 0.1%
625915 | 1 | < 0.1%
628019 | 1 | < 0.1%
629935 | 1 | < 0.1%
630993 | 3 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
20771082 | 1 | < 0.1%
20771081 | 1 | < 0.1%
20771079 | 2 | < 0.1%
20771078 | 2 | < 0.1%
20771077 | 2 | < 0.1%
20771072 | 1 | < 0.1%
20771071 | 1 | < 0.1%
20771070 | 1 | < 0.1%
20771069 | 1 | < 0.1%
20771068 | 2 | < 0.1%

PERSON_AGE
Real number (ℝ)

MISSING  SKEWED  ZEROS 

Distinct: 896
Distinct (%): < 0.1%
Missing: 599276
Missing (%): 10.9%
Infinite: 0
Infinite (%): 0.0%
Mean: 37.336707
Minimum: -999
Maximum: 9999
Zeros: 547401
Zeros (%): 9.9%
Negative: 1205
Negative (%): < 0.1%
Memory size: 42.0 MiB

Quantile statistics

Minimum: -999
5-th percentile: 0
Q1: 24
Median: 36
Q3: 50
95-th percentile: 68
Maximum: 9999
Range: 10998
Interquartile range (IQR): 26

Descriptive statistics

Standard deviation: 113.26323
Coefficient of variation (CV): 3.0335623
Kurtosis: 5856.8109
Mean: 37.336707
Median Absolute Deviation (MAD): 13
Skewness: 72.220934
Sum: 1.832663 × 10^8
Variance: 12828.559
Monotonicity: Not monotonic
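
Several of the PERSON_AGE statistics are derived from one another, which makes them easy to cross-check. The sketch below recomputes the coefficient of variation, variance, IQR, and range from the reported values. It also computes Tukey fences (Q1 - 1.5 × IQR and Q3 + 1.5 × IQR), a conventional outlier rule that the report itself does not use; it is included here only to show how far the extremes of -999 and 9999 lie from the bulk of the data.

```python
# Values copied from the PERSON_AGE statistics.
mean = 37.336707
std = 113.26323
q1, q3 = 24, 50
minimum, maximum = -999, 9999

cv = std / mean                # coefficient of variation
variance = std ** 2            # variance is the squared standard deviation
iqr = q3 - q1                  # interquartile range
value_range = maximum - minimum

# Tukey fences: a common convention, not something defined by the report.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"CV: {cv:.4f}, variance: {variance:.2f}")
print(f"IQR: {iqr}, range: {value_range}")
print(f"Tukey fences: [{lower_fence}, {upper_fence}]")
```

The standard deviation (113.26) is far larger than the IQR (26), which is what the skewness alert reflects: a small number of extreme values dominate the moment-based statistics.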
Histogram with fixed size bins (bins=50)
Common values
Value | Count | Frequency (%)
0 | 547401 | 9.9%
30 | 112254 | 2.0%
29 | 111764 | 2.0%
28 | 111399 | 2.0%
27 | 110861 | 2.0%
31 | 108187 | 2.0%
26 | 107508 | 2.0%
32 | 106776 | 1.9%
33 | 103839 | 1.9%
25 | 103456 | 1.9%
Other values (886) | 3385030 | 61.5%
(Missing) | 599276 | 10.9%

Minimum 10 values
Value | Count | Frequency (%)
-999 | 8 | < 0.1%
-997 | 2 | < 0.1%
-996 | 1 | < 0.1%
-992 | 2 | < 0.1%
-991 | 1 | < 0.1%
-990 | 3 | < 0.1%
-989 | 1 | < 0.1%
-987 | 1 | < 0.1%
-982 | 3 | < 0.1%
-980 | 1 | < 0.1%

Maximum 10 values
Value | Count | Frequency (%)
9999 | 417 | < 0.1%
9262 | 1 | < 0.1%
9232 | 1 | < 0.1%
9211 | 1 | < 0.1%
9191 | 1 | < 0.1%
9151 | 1 | < 0.1%
9131 | 1 | < 0.1%
9122 | 1 | < 0.1%
8051 | 1 | < 0.1%
8041 | 1 | < 0.1%
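
The extreme values above (1205 negative ages, 417 rows with 9999, and others in the thousands) cannot be real ages and look like placeholder or data-entry values. A minimal cleaning sketch follows. The function name and the upper cutoff of 120 are assumptions chosen for illustration. Zeros are kept as-is because the report cannot tell whether the 547401 zeros are infants or another placeholder.

```python
import math
from typing import Optional


def clean_age(age: Optional[float], max_age: float = 120) -> Optional[float]:
    """Return the age if it is plausible, otherwise None.

    max_age=120 is an assumed cutoff, not a value taken from the report.
    Zero is passed through unchanged because its meaning is ambiguous.
    """
    if age is None or math.isnan(age):
        return None
    if age < 0 or age > max_age:
        return None
    return age


for raw in (-999, 0, 36, 9999):
    print(raw, "->", clean_age(raw))
```

Recomputing the mean and skewness after such a filter would show how much the placeholder values distort the statistics reported above.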

EJECTION
Categorical

IMBALANCE  MISSING 

Distinct: 6
Distinct (%): < 0.1%
Missing: 2679259
Missing (%): 48.6%
Memory size: 42.0 MiB

Not Ejected: 2772638
Ejected: 26536
Does Not Apply: 15891
Partially Ejected: 11563
Trapped: 1323

Length

Max length: 17
Median length: 11
Mean length: 11.00122
Min length: 7

Characters and Unicode

Total characters: 31116863
Distinct characters: 24
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Not Ejected
2nd row: Not Ejected
3rd row: Not Ejected
4th row: Not Ejected
5th row: Not Ejected

Common Values

Value | Count | Frequency (%)
Not Ejected | 2772638 | 50.3%
Ejected | 26536 | 0.5%
Does Not Apply | 15891 | 0.3%
Partially Ejected | 11563 | 0.2%
Trapped | 1323 | < 0.1%
Unknown | 541 | < 0.1%
(Missing) | 2679259 | 48.6%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
ejected | 2810737 | 49.8%
not | 2788529 | 49.4%
does | 15891 | 0.3%
apply | 15891 | 0.3%
partially | 11563 | 0.2%
trapped | 1323 | < 0.1%
unknown | 541 | < 0.1%
Most occurring characters

ValueCountFrequency (%)
e 5638688
18.1%
t 5610829
18.0%
2815983
9.0%
d 2812060
9.0%
E 2810737
9.0%
j 2810737
9.0%
c 2810737
9.0%
o 2804961
9.0%
N 2788529
9.0%
l 39017
 
0.1%
Other values (14) 174585
 
0.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 22656405
72.8%
Uppercase Letter 5644475
 
18.1%
Space Separator 2815983
 
9.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 5638688
24.9%
t 5610829
24.8%
d 2812060
12.4%
j 2810737
12.4%
c 2810737
12.4%
o 2804961
12.4%
l 39017
 
0.2%
p 34428
 
0.2%
y 27454
 
0.1%
a 24449
 
0.1%
Other values (6) 43045
 
0.2%
Uppercase Letter
ValueCountFrequency (%)
E 2810737
49.8%
N 2788529
49.4%
A 15891
 
0.3%
D 15891
 
0.3%
P 11563
 
0.2%
T 1323
 
< 0.1%
U 541
 
< 0.1%
Space Separator
ValueCountFrequency (%)
2815983
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 28300880
91.0%
Common 2815983
 
9.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 5638688
19.9%
t 5610829
19.8%
d 2812060
9.9%
E 2810737
9.9%
j 2810737
9.9%
c 2810737
9.9%
o 2804961
9.9%
N 2788529
9.9%
l 39017
 
0.1%
p 34428
 
0.1%
Other values (13) 140157
 
0.5%
Common
ValueCountFrequency (%)
2815983
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 31116863
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 5638688
18.1%
t 5610829
18.0%
2815983
9.0%
d 2812060
9.0%
E 2810737
9.0%
j 2810737
9.0%
c 2810737
9.0%
o 2804961
9.0%
N 2788529
9.0%
l 39017
 
0.1%
Other values (14) 174585
 
0.6%

EMOTIONAL_STATUS
Categorical

IMBALANCE  MISSING 

Distinct: 8
Distinct (%): < 0.1%
Missing: 2592938
Missing (%): 47.1%
Memory size: 42.0 MiB

Does Not Apply: 2399462
Conscious: 476626
Unknown: 14887
Shock: 14452
Semiconscious: 2840
Other values (3): 6546

Length

Max length: 14
Median length: 14
Mean length: 13.095704
Min length: 5

Characters and Unicode

Total characters: 38171528
Distinct characters: 25
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Does Not Apply
2nd row: Does Not Apply
3rd row: Conscious
4th row: Conscious
5th row: Does Not Apply

Common Values

Value | Count | Frequency (%)
Does Not Apply | 2399462 | 43.6%
Conscious | 476626 | 8.7%
Unknown | 14887 | 0.3%
Shock | 14452 | 0.3%
Semiconscious | 2840 | 0.1%
Unconscious | 2705 | < 0.1%
Apparent Death | 1968 | < 0.1%
Incoherent | 1873 | < 0.1%
(Missing) | 2592938 | 47.1%

Length

Histogram of lengths of the category

Common Values (Plot)

Value | Count | Frequency (%)
does | 2399462 | 31.1%
not | 2399462 | 31.1%
apply | 2399462 | 31.1%
conscious | 476626 | 6.2%
unknown | 14887 | 0.2%
shock | 14452 | 0.2%
semiconscious | 2840 | < 0.1%
unconscious | 2705 | < 0.1%
apparent | 1968 | < 0.1%
death | 1968 | < 0.1%
Most occurring characters

ValueCountFrequency (%)
o 5794478
15.2%
p 4802860
12.6%
4800892
12.6%
s 3363804
8.8%
e 2409984
6.3%
t 2405271
6.3%
D 2401430
6.3%
A 2401430
6.3%
N 2399462
6.3%
l 2399462
6.3%
Other values (15) 4992455
13.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 25654931
67.2%
Uppercase Letter 7715705
 
20.2%
Space Separator 4800892
 
12.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 5794478
22.6%
p 4802860
18.7%
s 3363804
13.1%
e 2409984
9.4%
t 2405271
9.4%
l 2399462
9.4%
y 2399462
9.4%
n 535251
 
2.1%
c 504041
 
2.0%
i 485011
 
1.9%
Other values (7) 555307
 
2.2%
Uppercase Letter
ValueCountFrequency (%)
D 2401430
31.1%
A 2401430
31.1%
N 2399462
31.1%
C 476626
 
6.2%
U 17592
 
0.2%
S 17292
 
0.2%
I 1873
 
< 0.1%
Space Separator
ValueCountFrequency (%)
4800892
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 33370636
87.4%
Common 4800892
 
12.6%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 5794478
17.4%
p 4802860
14.4%
s 3363804
10.1%
e 2409984
7.2%
t 2405271
7.2%
D 2401430
7.2%
A 2401430
7.2%
N 2399462
7.2%
l 2399462
7.2%
y 2399462
7.2%
Other values (14) 2592993
7.8%
Common
ValueCountFrequency (%)
4800892
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 38171528
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 5794478
15.2%
p 4802860
12.6%
4800892
12.6%
s 3363804
8.8%
e 2409984
6.3%
t 2405271
6.3%
D 2401430
6.3%
A 2401430
6.3%
N 2399462
6.3%
l 2399462
6.3%
Other values (15) 4992455
13.1%

BODILY_INJURY
Categorical

IMBALANCE  MISSING 

Distinct: 14
Distinct (%): < 0.1%
Missing: 2592895
Missing (%): 47.1%
Memory size: 42.0 MiB

Does Not Apply: 2430275
Back: 80790
Neck: 77553
Knee-Lower Leg Foot: 74430
Head: 67113
Other values (9): 184695

Length

Max length: 20
Median length: 14
Mean length: 13.29725
Min length: 3

Characters and Unicode

Total characters: 38759570
Distinct characters: 36
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Does Not Apply
2nd row: Does Not Apply
3rd row: Back
4th row: Shoulder - Upper Arm
5th row: Does Not Apply

Common Values

Value | Count | Frequency (%)
Does Not Apply | 2430275 | 44.1%
Back | 80790 | 1.5%
Neck | 77553 | 1.4%
Knee-Lower Leg Foot | 74430 | 1.4%
Head | 67113 | 1.2%
Entire Body | 39667 | 0.7%
Elbow-Lower-Arm-Hand | 33361 | 0.6%
Shoulder - Upper Arm | 33301 | 0.6%
Unknown | 21110 | 0.4%
Chest | 17646 | 0.3%
Other values (4) | 39610 | 0.7%
(Missing) | 2592895 | 47.1%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
does | 2430275 | 30.0%
apply | 2430275 | 30.0%
not | 2430275 | 30.0%
leg | 91599 | 1.1%
back | 80790 | 1.0%
neck | 77553 | 1.0%
knee-lower | 74430 | 0.9%
foot | 74430 | 0.9%
head | 67113 | 0.8%
(blank) | 41810 | 0.5%
Other values (13) | 299473 | 3.7%

Most occurring characters

ValueCountFrequency (%)
o 5253149
13.6%
5183167
13.4%
p 4978659
12.8%
e 3095225
8.0%
t 2562018
6.6%
N 2507828
6.5%
A 2505446
6.5%
l 2505446
6.5%
y 2470862
6.4%
s 2456430
6.3%
Other values (26) 5241340
13.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 25095016
64.7%
Uppercase Letter 8247895
 
21.3%
Space Separator 5183167
 
13.4%
Dash Punctuation 233492
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 5253149
20.9%
p 4978659
19.8%
e 3095225
12.3%
t 2562018
10.2%
l 2505446
10.0%
y 2470862
9.8%
s 2456430
9.8%
r 297891
 
1.2%
n 219297
 
0.9%
a 194276
 
0.8%
Other values (11) 1061763
 
4.2%
Uppercase Letter
ValueCountFrequency (%)
N 2507828
30.4%
A 2505446
30.4%
D 2430275
29.5%
L 199390
 
2.4%
B 120457
 
1.5%
H 117643
 
1.4%
F 87442
 
1.1%
K 74430
 
0.9%
E 73948
 
0.9%
U 71580
 
0.9%
Other values (3) 59456
 
0.7%
Space Separator
ValueCountFrequency (%)
5183167
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 233492
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 33342911
86.0%
Common 5416659
 
14.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 5253149
15.8%
p 4978659
14.9%
e 3095225
9.3%
t 2562018
7.7%
N 2507828
7.5%
A 2505446
7.5%
l 2505446
7.5%
y 2470862
7.4%
s 2456430
7.4%
D 2430275
7.3%
Other values (24) 2577573
7.7%
Common
ValueCountFrequency (%)
5183167
95.7%
- 233492
 
4.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 38759570
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 5253149
13.6%
5183167
13.4%
p 4978659
12.8%
e 3095225
8.0%
t 2562018
6.6%
N 2507828
6.5%
A 2505446
6.5%
l 2505446
6.5%
y 2470862
6.4%
s 2456430
6.3%
Other values (26) 5241340
13.5%

POSITION_IN_VEHICLE
Categorical

IMBALANCE  MISSING 

Distinct: 11
Distinct (%): < 0.1%
Missing: 2678865
Missing (%): 48.6%
Memory size: 42.0 MiB

Driver: 1975733
Front passenger, if two or more persons, including the driver, are in the front seat: 346747
Right rear passenger or motorcycle sidecar passenger: 141253
Left rear passenger, or rear passenger on a bicycle, motorcycle, snowmobile: 132441
Any person in the rear of a station wagon, pick-up truck, all passengers on a bus, etc: 76572
Other values (6): 156140

Length

Max length: 86
Median length: 6
Mean length: 24.567415
Min length: 6

Characters and Unicode

Total characters: 69498415
Distinct characters: 39
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Front passenger, if two or more persons, including the driver, are in the front seat
2nd row: Right rear passenger or motorcycle sidecar passenger
3rd row: Driver
4th row: Driver
5th row: Driver

Common Values

Value | Count | Frequency (%)
Driver | 1975733 | 35.9%
Front passenger, if two or more persons, including the driver, are in the front seat | 346747 | 6.3%
Right rear passenger or motorcycle sidecar passenger | 141253 | 2.6%
Left rear passenger, or rear passenger on a bicycle, motorcycle, snowmobile | 132441 | 2.4%
Any person in the rear of a station wagon, pick-up truck, all passengers on a bus, etc | 76572 | 1.4%
Unknown | 67074 | 1.2%
Middle rear seat, or passenger lying across a seat | 42750 | 0.8%
Middle front seat, or passenger lying across a seat | 34595 | 0.6%
Riding/Hanging on Outside | 7543 | 0.1%
Does Not Apply | 3246 | 0.1%
(Missing) | 2678865 | 48.6%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
driver | 2322480 | 19.8%
passenger | 971480 | 8.3%
the | 770066 | 6.6%
front | 728089 | 6.2%
or | 697786 | 5.9%
rear | 525457 | 4.5%
seat | 501437 | 4.3%
in | 423319 | 3.6%
a | 362930 | 3.1%
if | 347679 | 3.0%
Other values (38) | 4077495 | 34.8%

Most occurring characters

ValueCountFrequency (%)
r 9858274
14.2%
8899332
12.8%
e 8314925
12.0%
i 4672366
 
6.7%
n 4200842
 
6.0%
s 4042088
 
5.8%
o 3957733
 
5.7%
a 3244208
 
4.7%
t 3212668
 
4.6%
v 2322480
 
3.3%
Other values (29) 16773499
24.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 55918015
80.5%
Space Separator 8899332
 
12.8%
Uppercase Letter 2850464
 
4.1%
Other Punctuation 1754032
 
2.5%
Dash Punctuation 76572
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 9858274
17.6%
e 8314925
14.9%
i 4672366
8.4%
n 4200842
7.5%
s 4042088
 
7.2%
o 3957733
 
7.1%
a 3244208
 
5.8%
t 3212668
 
5.7%
v 2322480
 
4.2%
g 1712598
 
3.1%
Other values (12) 10379833
18.6%
Uppercase Letter
ValueCountFrequency (%)
D 1978979
69.4%
F 346747
 
12.2%
R 148796
 
5.2%
L 132441
 
4.6%
A 79818
 
2.8%
M 77345
 
2.7%
U 67074
 
2.4%
H 7543
 
0.3%
O 7543
 
0.3%
N 3246
 
0.1%
Other Punctuation
ValueCountFrequency (%)
, 1744625
99.5%
/ 7543
 
0.4%
& 932
 
0.1%
; 932
 
0.1%
Space Separator
ValueCountFrequency (%)
8899332
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 76572
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 58768479
84.6%
Common 10729936
 
15.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 9858274
16.8%
e 8314925
14.1%
i 4672366
 
8.0%
n 4200842
 
7.1%
s 4042088
 
6.9%
o 3957733
 
6.7%
a 3244208
 
5.5%
t 3212668
 
5.5%
v 2322480
 
4.0%
D 1978979
 
3.4%
Other values (23) 12963916
22.1%
Common
ValueCountFrequency (%)
8899332
82.9%
, 1744625
 
16.3%
- 76572
 
0.7%
/ 7543
 
0.1%
& 932
 
< 0.1%
; 932
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 69498415
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r 9858274
14.2%
8899332
12.8%
e 8314925
12.0%
i 4672366
 
6.7%
n 4200842
 
6.0%
s 4042088
 
5.8%
o 3957733
 
5.7%
a 3244208
 
4.7%
t 3212668
 
4.6%
v 2322480
 
3.3%
Other values (29) 16773499
24.1%

SAFETY_EQUIPMENT
Categorical

IMBALANCE  MISSING 

Distinct: 17
Distinct (%): < 0.1%
Missing: 2861973
Missing (%): 52.0%
Memory size: 42.0 MiB

Lap Belt & Harness: 1685514
Unknown: 445545
Lap Belt: 375064
Child Restraint Only: 45832
Air Bag Deployed/Lap Belt/Harness: 19552
Other values (12): 74271

Length

Max length: 40
Median length: 18
Mean length: 14.850971
Min length: 1

Characters and Unicode

Total characters: 39292373
Distinct characters: 37
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Lap Belt & Harness
2nd row: Lap Belt
3rd row: Lap Belt & Harness
4th row: Lap Belt & Harness
5th row: Lap Belt & Harness

Common Values

Value | Count | Frequency (%)
Lap Belt & Harness | 1685514 | 30.6%
Unknown | 445545 | 8.1%
Lap Belt | 375064 | 6.8%
Child Restraint Only | 45832 | 0.8%
Air Bag Deployed/Lap Belt/Harness | 19552 | 0.4%
Other | 15433 | 0.3%
Helmet (Motorcycle Only) | 14278 | 0.3%
Harness | 12531 | 0.2%
Helmet Only (In-Line Skater/Bicyclist) | 10647 | 0.2%
- | 7185 | 0.1%
Other values (7) | 14197 | 0.3%
(Missing) | 2861973 | 52.0%

Length

Histogram of lengths of the category

Value | Count | Frequency (%)
belt | 2063714 | 24.8%
lap | 2060580 | 24.8%
harness | 1698045 | 20.4%
(blank) | 1692699 | 20.3%
unknown | 445545 | 5.4%
only | 71394 | 0.9%
restraint | 46400 | 0.6%
child | 45832 | 0.6%
air | 29927 | 0.4%
bag | 29927 | 0.4%
Other values (12) | 136871 | 1.6%

Most occurring characters

ValueCountFrequency (%)
5675156
14.4%
e 4025715
10.2%
a 3891748
9.9%
s 3496702
8.9%
n 3200962
 
8.1%
l 2287842
 
5.8%
t 2266554
 
5.8%
B 2127662
 
5.4%
p 2114295
 
5.4%
L 2097735
 
5.3%
Other values (27) 8108002
20.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 25088939
63.9%
Uppercase Letter 6703162
 
17.1%
Space Separator 5675156
 
14.4%
Other Punctuation 1745974
 
4.4%
Open Punctuation 28745
 
0.1%
Close Punctuation 28745
 
0.1%
Dash Punctuation 21652
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 4025715
16.0%
a 3891748
15.5%
s 3496702
13.9%
n 3200962
12.8%
l 2287842
9.1%
t 2266554
9.0%
p 2114295
8.4%
r 1841837
7.3%
o 504578
 
2.0%
k 460012
 
1.8%
Other values (8) 998694
 
4.0%
Uppercase Letter
ValueCountFrequency (%)
B 2127662
31.7%
L 2097735
31.3%
H 1745707
26.0%
U 445545
 
6.6%
O 90010
 
1.3%
R 46400
 
0.7%
C 46400
 
0.7%
A 29927
 
0.4%
D 29927
 
0.4%
S 15017
 
0.2%
Other values (3) 28832
 
0.4%
Other Punctuation
ValueCountFrequency (%)
& 1685514
96.5%
/ 60460
 
3.5%
Space Separator
ValueCountFrequency (%)
5675156
100.0%
Open Punctuation
ValueCountFrequency (%)
( 28745
100.0%
Close Punctuation
ValueCountFrequency (%)
) 28745
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 21652
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 31792101
80.9%
Common 7500272
 
19.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 4025715
12.7%
a 3891748
12.2%
s 3496702
11.0%
n 3200962
10.1%
l 2287842
7.2%
t 2266554
7.1%
B 2127662
6.7%
p 2114295
6.7%
L 2097735
6.6%
r 1841837
5.8%
Other values (21) 4441049
14.0%
Common
ValueCountFrequency (%)
5675156
75.7%
& 1685514
 
22.5%
/ 60460
 
0.8%
( 28745
 
0.4%
) 28745
 
0.4%
- 21652
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 39292373
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
5675156
14.4%
e 4025715
10.2%
a 3891748
9.9%
s 3496702
8.9%
n 3200962
 
8.1%
l 2287842
 
5.8%
t 2266554
 
5.8%
B 2127662
 
5.4%
p 2114295
 
5.4%
L 2097735
 
5.3%
Other values (27) 8108002
20.6%

PED_LOCATION
Categorical

MISSING 

Distinct: 4
Distinct (%): < 0.1%
Missing: 5416437
Missing (%): 98.3%
Memory size: 42.0 MiB

Value | Count
Pedestrian/Bicyclist/Other Pedestrian at Intersection | 55506
Pedestrian/Bicyclist/Other Pedestrian Not at Intersection | 29588
Does Not Apply | 3638
Unknown | 2582

Length

Max length: 57
Median length: 53
Mean length: 51.441619
Min length: 7
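
The length statistics for PED_LOCATION can be reproduced from the value counts listed for this variable. The sketch below is an illustration under one assumption: that lengths are computed over non-missing values only, which is consistent with the figures reported here. The function name `length_summary` is hypothetical and not part of the profiler.

```python
import statistics

# Value counts for PED_LOCATION, copied from this report.
value_counts = {
    "Pedestrian/Bicyclist/Other Pedestrian at Intersection": 55506,
    "Pedestrian/Bicyclist/Other Pedestrian Not at Intersection": 29588,
    "Does Not Apply": 3638,
    "Unknown": 2582,
}


def length_summary(counts):
    """Summarise string lengths, repeating each value's length by its count."""
    lengths = [len(value) for value, count in counts.items() for _ in range(count)]
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": sum(lengths) / len(lengths),
        "min": min(lengths),
        "total_characters": sum(lengths),
    }


summary = length_summary(value_counts)
for key, value in summary.items():
    print(key, value)
```

The total of 4697340 characters divided by the 91314 non-missing rows gives the reported mean of about 51.44.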

Characters and Unicode

Total characters: 4697340
Distinct characters: 26
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Pedestrian/Bicyclist/Other Pedestrian at Intersection
2nd row: Pedestrian/Bicyclist/Other Pedestrian at Intersection
3rd row: Pedestrian/Bicyclist/Other Pedestrian Not at Intersection
4th row: Pedestrian/Bicyclist/Other Pedestrian at Intersection
5th row: Pedestrian/Bicyclist/Other Pedestrian at Intersection

Common Values

Value | Count | Frequency (%)
Pedestrian/Bicyclist/Other Pedestrian at Intersection | 55506 | 1.0%
Pedestrian/Bicyclist/Other Pedestrian Not at Intersection | 29588 | 0.5%
Does Not Apply | 3638 | 0.1%
Unknown | 2582 | < 0.1%
(Missing) | 5416437 | 98.3%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Common values plot]
Value | Count | Frequency (%)
pedestrian/bicyclist/other | 85094 | 22.2%
pedestrian | 85094 | 22.2%
at | 85094 | 22.2%
intersection | 85094 | 22.2%
not | 33226 | 8.7%
does | 3638 | 0.9%
apply | 3638 | 0.9%
unknown | 2582 | 0.7%
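
The word counts above are consistent with lowercasing each value and splitting it on whitespace, so that "Pedestrian/Bicyclist/Other" stays a single token. The sketch below reproduces the table under that assumption. It is not taken from the profiler's source, and `word_counts` is a hypothetical helper.

```python
from collections import Counter

# Value counts for PED_LOCATION, copied from this report.
value_counts = {
    "Pedestrian/Bicyclist/Other Pedestrian at Intersection": 55506,
    "Pedestrian/Bicyclist/Other Pedestrian Not at Intersection": 29588,
    "Does Not Apply": 3638,
    "Unknown": 2582,
}


def word_counts(counts):
    """Lowercase each value, split on whitespace, and weight tokens by row count."""
    words = Counter()
    for value, count in counts.items():
        for token in value.lower().split():
            words[token] += count
    return words


words = word_counts(value_counts)
total = sum(words.values())
for token, n in words.most_common():
    print(f"{token}: {n} ({100 * n / total:.1f}%)")
```

The token "not" occurs in both "Not at Intersection" and "Does Not Apply", which is why its count (33226) is the sum of those two rows.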

Most occurring characters

ValueCountFrequency (%)
t 628884
13.4%
e 599296
12.8%
i 425470
9.1%
n 348122
 
7.4%
s 344014
 
7.3%
r 340376
 
7.2%
292146
 
6.2%
a 255282
 
5.4%
c 255282
 
5.4%
P 170188
 
3.6%
Other values (16) 1038280
22.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3766452
80.2%
Uppercase Letter 468554
 
10.0%
Space Separator 292146
 
6.2%
Other Punctuation 170188
 
3.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 628884
16.7%
e 599296
15.9%
i 425470
11.3%
n 348122
9.2%
s 344014
9.1%
r 340376
9.0%
a 255282
6.8%
c 255282
6.8%
d 170188
 
4.5%
o 124540
 
3.3%
Other values (6) 274998
7.3%
Uppercase Letter
ValueCountFrequency (%)
P 170188
36.3%
O 85094
18.2%
I 85094
18.2%
B 85094
18.2%
N 33226
 
7.1%
D 3638
 
0.8%
A 3638
 
0.8%
U 2582
 
0.6%
Space Separator
ValueCountFrequency (%)
292146
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 170188
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 4235006
90.2%
Common 462334
 
9.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 628884
14.8%
e 599296
14.2%
i 425470
10.0%
n 348122
8.2%
s 344014
8.1%
r 340376
8.0%
a 255282
 
6.0%
c 255282
 
6.0%
P 170188
 
4.0%
d 170188
 
4.0%
Other values (14) 697904
16.5%
Common
ValueCountFrequency (%)
292146
63.2%
/ 170188
36.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4697340
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t 628884
13.4%
e 599296
12.8%
i 425470
9.1%
n 348122
 
7.4%
s 344014
 
7.3%
r 340376
 
7.2%
292146
 
6.2%
a 255282
 
5.4%
c 255282
 
5.4%
P 170188
 
3.6%
Other values (16) 1038280
22.1%

PED_ACTION
Categorical

MISSING 

Distinct: 16
Distinct (%): < 0.1%
Missing: 5416538
Missing (%): 98.3%
Memory size: 42.0 MiB

Value | Count
Crossing With Signal | 34140
Crossing, No Signal, or Crosswalk | 15484
Crossing, No Signal, Marked Crosswalk | 7819
Other Actions in Roadway | 7153
Crossing Against Signal | 6347
Other values (11) | 20270

Length

Max length: 47
Median length: 44
Mean length: 24.457424
Min length: 7

Characters and Unicode

Total characters: 2230835
Distinct characters: 41
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Crossing With Signal
2nd row: Crossing With Signal
3rd row: Crossing, No Signal, or Crosswalk
4th row: Crossing With Signal
5th row: Crossing With Signal

Common Values

Value | Count | Frequency (%)
Crossing With Signal | 34140 | 0.6%
Crossing, No Signal, or Crosswalk | 15484 | 0.3%
Crossing, No Signal, Marked Crosswalk | 7819 | 0.1%
Other Actions in Roadway | 7153 | 0.1%
Crossing Against Signal | 6347 | 0.1%
Unknown | 4356 | 0.1%
Not in Roadway | 4313 | 0.1%
Does Not Apply | 4058 | 0.1%
Emerging from in Front of/Behind Parked Vehicle | 2864 | 0.1%
Working in Roadway | 1387 | < 0.1%
Other values (6) | 3292 | 0.1%
(Missing) | 5416538 | 98.3%

Length

[Figure: Histogram of lengths of the category]
ValueCountFrequency (%)
crossing 63790
18.9%
signal 63790
18.9%
with 35057
10.4%
crosswalk 23303
 
6.9%
no 23303
 
6.9%
in 16229
 
4.8%
or 15484
 
4.6%
roadway 13365
 
4.0%
other 8409
 
2.5%
not 8371
 
2.5%
Other values (29) 66655
19.7%

Most occurring characters

ValueCountFrequency (%)
246543
11.1%
i 212305
 
9.5%
s 193607
 
8.7%
n 189361
 
8.5%
o 177802
 
8.0%
g 148526
 
6.7%
a 136813
 
6.1%
r 133401
 
6.0%
l 99554
 
4.5%
C 87322
 
3.9%
Other values (31) 605601
27.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1626387
72.9%
Uppercase Letter 305655
 
13.7%
Space Separator 246543
 
11.1%
Other Punctuation 52250
 
2.3%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 212305
13.1%
s 193607
11.9%
n 189361
11.6%
o 177802
10.9%
g 148526
9.1%
a 136813
8.4%
r 133401
8.2%
l 99554
6.1%
t 71166
 
4.4%
h 54486
 
3.4%
Other values (10) 209366
12.9%
Uppercase Letter
ValueCountFrequency (%)
C 87322
28.6%
S 65196
21.3%
W 37893
12.4%
N 31674
 
10.4%
A 19081
 
6.2%
R 14585
 
4.8%
O 10921
 
3.6%
M 7819
 
2.6%
U 4356
 
1.4%
B 4195
 
1.4%
Other values (8) 22613
 
7.4%
Other Punctuation
ValueCountFrequency (%)
, 46606
89.2%
/ 5644
 
10.8%
Space Separator
ValueCountFrequency (%)
246543
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1932042
86.6%
Common 298793
 
13.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 212305
11.0%
s 193607
10.0%
n 189361
 
9.8%
o 177802
 
9.2%
g 148526
 
7.7%
a 136813
 
7.1%
r 133401
 
6.9%
l 99554
 
5.2%
C 87322
 
4.5%
t 71166
 
3.7%
Other values (28) 482185
25.0%
Common
ValueCountFrequency (%)
246543
82.5%
, 46606
 
15.6%
/ 5644
 
1.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2230835
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
246543
11.1%
i 212305
 
9.5%
s 193607
 
8.7%
n 189361
 
8.5%
o 177802
 
8.0%
g 148526
 
6.7%
a 136813
 
6.1%
r 133401
 
6.0%
l 99554
 
4.5%
C 87322
 
3.9%
Other values (31) 605601
27.1%

COMPLAINT
Categorical

IMBALANCE  MISSING 

Distinct: 21
Distinct (%): < 0.1%
Missing: 2592888
Missing (%): 47.1%
Memory size: 42.0 MiB

Value | Count
Does Not Apply | 2431110
Complaint of Pain or Nausea | 216456
Complaint of Pain | 88495
None Visible | 49508
Minor Bleeding | 26386
Other values (16) | 102908

Length

Max length: 34
Median length: 14
Mean length: 14.956543
Min length: 7

Characters and Unicode

Total characters: 43596274
Distinct characters: 39
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Does Not Apply
2nd row: Does Not Apply
3rd row: Complaint of Pain or Nausea
4th row: None Visible
5th row: Does Not Apply

Common Values

Value | Count | Frequency (%)
Does Not Apply | 2431110 | 44.1%
Complaint of Pain or Nausea | 216456 | 3.9%
Complaint of Pain | 88495 | 1.6%
None Visible | 49508 | 0.9%
Minor Bleeding | 26386 | 0.5%
Contusion - Bruise | 20870 | 0.4%
Unknown | 20454 | 0.4%
Whiplash | 19846 | 0.4%
Abrasion | 15156 | 0.3%
Internal | 7809 | 0.1%
Other values (11) | 18773 | 0.3%
(Missing) | 2592888 | 47.1%

Length

[Figure: Histogram of lengths of the category]
ValueCountFrequency (%)
does 2431110
27.1%
not 2431110
27.1%
apply 2431110
27.1%
complaint 304951
 
3.4%
of 304951
 
3.4%
pain 304951
 
3.4%
or 216456
 
2.4%
nausea 216456
 
2.4%
none 49508
 
0.6%
visible 49508
 
0.6%
Other values (21) 232655
 
2.6%

Most occurring characters

ValueCountFrequency (%)
6057903
13.9%
o 5873572
13.5%
p 5187177
11.9%
e 2863975
 
6.6%
l 2850573
 
6.5%
s 2798861
 
6.4%
t 2794582
 
6.4%
N 2697074
 
6.2%
A 2446426
 
5.6%
D 2445147
 
5.6%
Other values (29) 7580984
17.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 29087012
66.7%
Uppercase Letter 8416452
 
19.3%
Space Separator 6057903
 
13.9%
Dash Punctuation 34907
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 5873572
20.2%
p 5187177
17.8%
e 2863975
9.8%
l 2850573
9.8%
s 2798861
9.6%
t 2794582
9.6%
y 2431142
8.4%
a 1104465
 
3.8%
i 870913
 
3.0%
n 869484
 
3.0%
Other values (13) 1442268
 
5.0%
Uppercase Letter
ValueCountFrequency (%)
N 2697074
32.0%
A 2446426
29.1%
D 2445147
29.1%
C 330678
 
3.9%
P 304983
 
3.6%
B 51781
 
0.6%
V 49508
 
0.6%
M 27838
 
0.3%
U 20454
 
0.2%
W 19846
 
0.2%
Other values (4) 22717
 
0.3%
Space Separator
ValueCountFrequency (%)
6057903
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 34907
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 37503464
86.0%
Common 6092810
 
14.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 5873572
15.7%
p 5187177
13.8%
e 2863975
7.6%
l 2850573
7.6%
s 2798861
7.5%
t 2794582
7.5%
N 2697074
7.2%
A 2446426
6.5%
D 2445147
6.5%
y 2431142
6.5%
Other values (27) 5114935
13.6%
Common
ValueCountFrequency (%)
6057903
99.4%
- 34907
 
0.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 43596274
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
6057903
13.9%
o 5873572
13.5%
p 5187177
11.9%
e 2863975
 
6.6%
l 2850573
 
6.5%
s 2798861
 
6.4%
t 2794582
 
6.4%
N 2697074
 
6.2%
A 2446426
 
5.6%
D 2445147
 
5.6%
Other values (29) 7580984
17.4%

PED_ROLE
Categorical

MISSING 

Distinct: 10
Distinct (%): < 0.1%
Missing: 194889
Missing (%): 3.5%
Memory size: 42.0 MiB

Value | Count
Registrant | 2283748
Driver | 2022958
Passenger | 800425
Pedestrian | 89617
Witness | 74794
Other values (5) | 41320

Length

Max length: 15
Median length: 14
Mean length: 8.2660745
Min length: 5

Characters and Unicode

Total characters: 43916513
Distinct characters: 30
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Registrant
2nd row: Passenger
3rd row: Registrant
4th row: Notified Person
5th row: Passenger

Common Values

Value | Count | Frequency (%)
Registrant | 2283748 | 41.5%
Driver | 2022958 | 36.7%
Passenger | 800425 | 14.5%
Pedestrian | 89617 | 1.6%
Witness | 74794 | 1.4%
Owner | 27910 | 0.5%
Notified Person | 8841 | 0.2%
Policy Holder | 2415 | < 0.1%
Other | 1776 | < 0.1%
In-Line Skater | 378 | < 0.1%
(Missing) | 194889 | 3.5%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Common values plot]
ValueCountFrequency (%)
registrant 2283748
42.9%
driver 2022958
38.0%
passenger 800425
 
15.0%
pedestrian 89617
 
1.7%
witness 74794
 
1.4%
owner 27910
 
0.5%
notified 8841
 
0.2%
person 8841
 
0.2%
policy 2415
 
< 0.1%
holder 2415
 
< 0.1%
Other values (3) 2532
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
r 7261026
16.5%
e 6212123
14.1%
t 4742902
10.8%
i 4491592
10.2%
s 4132644
9.4%
n 3286091
7.5%
a 3174168
7.2%
g 3084173
7.0%
R 2283748
 
5.2%
D 2022958
 
4.6%
Other values (20) 3225088
7.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 38579627
87.8%
Uppercase Letter 5324874
 
12.1%
Space Separator 11634
 
< 0.1%
Dash Punctuation 378
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 7261026
18.8%
e 6212123
16.1%
t 4742902
12.3%
i 4491592
11.6%
s 4132644
10.7%
n 3286091
8.5%
a 3174168
8.2%
g 3084173
8.0%
v 2022958
 
5.2%
d 100873
 
0.3%
Other values (8) 71077
 
0.2%
Uppercase Letter
ValueCountFrequency (%)
R 2283748
42.9%
D 2022958
38.0%
P 901298
 
16.9%
W 74794
 
1.4%
O 29686
 
0.6%
N 8841
 
0.2%
H 2415
 
< 0.1%
I 378
 
< 0.1%
L 378
 
< 0.1%
S 378
 
< 0.1%
Space Separator
ValueCountFrequency (%)
11634
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 378
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 43904501
> 99.9%
Common 12012
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 7261026
16.5%
e 6212123
14.1%
t 4742902
10.8%
i 4491592
10.2%
s 4132644
9.4%
n 3286091
7.5%
a 3174168
7.2%
g 3084173
7.0%
R 2283748
 
5.2%
D 2022958
 
4.6%
Other values (18) 3213076
7.3%
Common
ValueCountFrequency (%)
11634
96.9%
- 378
 
3.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 43916513
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r 7261026
16.5%
e 6212123
14.1%
t 4742902
10.8%
i 4491592
10.2%
s 4132644
9.4%
n 3286091
7.5%
a 3174168
7.2%
g 3084173
7.0%
R 2283748
 
5.2%
D 2022958
 
4.6%
Other values (20) 3225088
7.3%

CONTRIBUTING_FACTOR_1
Text

MISSING 

Distinct: 53
Distinct (%): 0.1%
Missing: 5417789
Missing (%): 98.4%
Memory size: 42.0 MiB

Length

Max length: 53
Median length: 11
Mean length: 19.59404
Min length: 5

Characters and Unicode

Total characters: 1762719
Distinct characters: 52
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 4
Unique (%): < 0.1%

Sample

1st row: Unspecified
2nd row: Unspecified
3rd row: Unspecified
4th row: Unspecified
5th row: Unspecified
ValueCountFrequency (%)
unspecified 62863
45.2%
error/confusion 14199
 
10.2%
pedestrian/bicyclist/other 14199
 
10.2%
pedestrian 14199
 
10.2%
driver 3403
 
2.4%
inattention/distraction 3317
 
2.4%
to 2433
 
1.7%
failure 2371
 
1.7%
yield 2347
 
1.7%
right-of-way 2347
 
1.7%
Other values (90) 17486
 
12.6%

Most occurring characters

ValueCountFrequency (%)
i 226607
12.9%
e 223701
12.7%
n 141140
 
8.0%
s 128644
 
7.3%
r 109046
 
6.2%
c 100279
 
5.7%
d 99107
 
5.6%
t 85246
 
4.8%
f 82933
 
4.7%
p 64088
 
3.6%
Other values (42) 501928
28.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1475172
83.7%
Uppercase Letter 185649
 
10.5%
Space Separator 49202
 
2.8%
Other Punctuation 46852
 
2.7%
Dash Punctuation 5058
 
0.3%
Close Punctuation 393
 
< 0.1%
Open Punctuation 393
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 226607
15.4%
e 223701
15.2%
n 141140
9.6%
s 128644
8.7%
r 109046
7.4%
c 100279
6.8%
d 99107
6.7%
t 85246
 
5.8%
f 82933
 
5.6%
p 64088
 
4.3%
Other values (15) 214381
14.5%
Uppercase Letter
ValueCountFrequency (%)
U 63662
34.3%
P 29546
15.9%
C 16471
 
8.9%
O 16121
 
8.7%
B 14463
 
7.8%
E 14229
 
7.7%
D 8960
 
4.8%
I 5081
 
2.7%
R 2795
 
1.5%
F 2520
 
1.4%
Other values (12) 11801
 
6.4%
Space Separator
ValueCountFrequency (%)
49202
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 46852
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 5058
100.0%
Close Punctuation
ValueCountFrequency (%)
) 393
100.0%
Open Punctuation
ValueCountFrequency (%)
( 393
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1660821
94.2%
Common 101898
 
5.8%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 226607
13.6%
e 223701
13.5%
n 141140
 
8.5%
s 128644
 
7.7%
r 109046
 
6.6%
c 100279
 
6.0%
d 99107
 
6.0%
t 85246
 
5.1%
f 82933
 
5.0%
p 64088
 
3.9%
Other values (37) 400030
24.1%
Common
ValueCountFrequency (%)
49202
48.3%
/ 46852
46.0%
- 5058
 
5.0%
) 393
 
0.4%
( 393
 
0.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1762719
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 226607
12.9%
e 223701
12.7%
n 141140
 
8.0%
s 128644
 
7.3%
r 109046
 
6.2%
c 100279
 
5.7%
d 99107
 
5.6%
t 85246
 
4.8%
f 82933
 
4.7%
p 64088
 
3.6%
Other values (42) 501928
28.5%

CONTRIBUTING_FACTOR_2
Text

MISSING 

Distinct: 51
Distinct (%): 0.1%
Missing: 5417906
Missing (%): 98.4%
Memory size: 42.0 MiB

Length

Max length: 53
Median length: 11
Mean length: 13.861194
Min length: 5

Characters and Unicode

Total characters: 1245359
Distinct characters: 52
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5
Unique (%): < 0.1%

Sample

1st row: Unspecified
2nd row: Unspecified
3rd row: Unspecified
4th row: Unspecified
5th row: Unspecified
ValueCountFrequency (%)
unspecified 79006
72.3%
pedestrian/bicyclist/other 3940
 
3.6%
pedestrian 3940
 
3.6%
error/confusion 3940
 
3.6%
driver 1447
 
1.3%
inattention/distraction 1314
 
1.2%
to 1291
 
1.2%
failure 1258
 
1.2%
yield 1235
 
1.1%
right-of-way 1235
 
1.1%
Other values (84) 10646
 
9.7%

Most occurring characters

ValueCountFrequency (%)
i 193095
15.5%
e 192459
15.5%
n 104694
8.4%
s 99760
8.0%
d 91909
7.4%
c 91446
7.3%
f 86663
7.0%
p 80019
6.4%
U 79513
6.4%
r 36741
 
3.0%
Other values (42) 189060
15.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1086410
87.2%
Uppercase Letter 122747
 
9.9%
Space Separator 19407
 
1.6%
Other Punctuation 13745
 
1.1%
Dash Punctuation 2650
 
0.2%
Open Punctuation 200
 
< 0.1%
Close Punctuation 200
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 193095
17.8%
e 192459
17.7%
n 104694
9.6%
s 99760
9.2%
d 91909
8.5%
c 91446
8.4%
f 86663
8.0%
p 80019
7.4%
r 36741
 
3.4%
t 29131
 
2.7%
Other values (15) 80493
7.4%
Uppercase Letter
ValueCountFrequency (%)
U 79513
64.8%
P 8560
 
7.0%
C 5539
 
4.5%
O 5099
 
4.2%
D 4252
 
3.5%
B 4044
 
3.3%
E 3966
 
3.2%
I 2178
 
1.8%
T 1429
 
1.2%
R 1423
 
1.2%
Other values (12) 6744
 
5.5%
Space Separator
ValueCountFrequency (%)
19407
100.0%
Other Punctuation
ValueCountFrequency (%)
/ 13745
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 2650
100.0%
Open Punctuation
ValueCountFrequency (%)
( 200
100.0%
Close Punctuation
ValueCountFrequency (%)
) 200
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1209157
97.1%
Common 36202
 
2.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 193095
16.0%
e 192459
15.9%
n 104694
8.7%
s 99760
8.3%
d 91909
7.6%
c 91446
7.6%
f 86663
7.2%
p 80019
6.6%
U 79513
6.6%
r 36741
 
3.0%
Other values (37) 152858
12.6%
Common
ValueCountFrequency (%)
19407
53.6%
/ 13745
38.0%
- 2650
 
7.3%
( 200
 
0.6%
) 200
 
0.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1245359
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 193095
15.5%
e 192459
15.5%
n 104694
8.4%
s 99760
8.0%
d 91909
7.4%
c 91446
7.3%
f 86663
7.0%
p 80019
6.4%
U 79513
6.4%
r 36741
 
3.0%
Other values (42) 189060
15.2%

PERSON_SEX
Categorical

MISSING 

Distinct: 3
Distinct (%): < 0.1%
Missing: 614713
Missing (%): 11.2%
Memory size: 42.0 MiB

Value | Count
M | 2969786
F | 1490172
U | 433080

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 4893038
Distinct characters: 3
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: U
2nd row: F
3rd row: M
4th row: F
5th row: M

Common Values

Value | Count | Frequency (%)
M | 2969786 | 53.9%
F | 1490172 | 27.1%
U | 433080 | 7.9%
(Missing) | 614713 | 11.2%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: Common values plot]
ValueCountFrequency (%)
m 2969786
60.7%
f 1490172
30.5%
u 433080
 
8.9%

Most occurring characters

ValueCountFrequency (%)
M 2969786
60.7%
F 1490172
30.5%
U 433080
 
8.9%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter 4893038
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
M 2969786
60.7%
F 1490172
30.5%
U 433080
 
8.9%

Most occurring scripts

ValueCountFrequency (%)
Latin 4893038
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
M 2969786
60.7%
F 1490172
30.5%
U 433080
 
8.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4893038
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
M 2969786
60.7%
F 1490172
30.5%
U 433080
 
8.9%

Interactions

[Figures: 16 interaction plots]

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
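
One common way to compute a nullity correlation is to take the Pearson correlation between the missing-value indicators of each pair of columns. The sketch below shows this with pandas on a small made-up frame. The six rows are hypothetical and only reuse column names from this report, and the code is an illustration of the idea, not the report's own implementation.

```python
import pandas as pd

# Hypothetical toy data: PED_LOCATION and PED_ACTION are missing in exactly
# the same rows, while PERSON_AGE is missing in only some of them.
df = pd.DataFrame(
    {
        "PED_LOCATION": ["At Intersection", None, None, "Unknown", None, None],
        "PED_ACTION": ["Crossing With Signal", None, None, "Unknown", None, None],
        "PERSON_AGE": [24.0, 33.0, None, 41.0, 55.0, None],
    }
)

# 1 where a cell is missing, 0 where it is present.
nullity = df.isnull().astype(int)

# Columns that are always present or always missing have zero variance,
# so their correlation is undefined; drop them before correlating.
nullity = nullity.loc[:, nullity.nunique() > 1]

# Pearson correlation between the missingness indicators.
nullity_corr = nullity.corr()
print(nullity_corr)
```

A value near 1 means two columns tend to be missing together, as PED_LOCATION and PED_ACTION are in the toy frame. A value near -1 means one tends to be present when the other is missing.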

Sample

index | UNIQUE_ID | COLLISION_ID | CRASH_DATE | CRASH_TIME | PERSON_ID | PERSON_TYPE | PERSON_INJURY | VEHICLE_ID | PERSON_AGE | EJECTION | EMOTIONAL_STATUS | BODILY_INJURY | POSITION_IN_VEHICLE | SAFETY_EQUIPMENT | PED_LOCATION | PED_ACTION | COMPLAINT | PED_ROLE | CONTRIBUTING_FACTOR_1 | CONTRIBUTING_FACTOR_2 | PERSON_SEX
0 | 10249006 | 4229554 | 10/26/2019 | 9:43 | 31aa2bc0-f545-444f-8cdb-f1cb5cf00b89 | Occupant | Unspecified | 19141108.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Registrant | NaN | NaN | U
1 | 10255054 | 4230587 | 10/25/2019 | 15:15 | 4629e500-a73e-48dc-b8fb-53124d124b80 | Occupant | Unspecified | 19144075.0 | 33.0 | Not Ejected | Does Not Apply | Does Not Apply | Front passenger, if two or more persons, including the driver, are in the front seat | Lap Belt & Harness | NaN | NaN | Does Not Apply | Passenger | NaN | NaN | F
2 | 10253177 | 4230550 | 10/26/2019 | 17:55 | ae48c136-1383-45db-83f4-2a5eecfb7cff | Occupant | Unspecified | 19143133.0 | 55.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Registrant | NaN | NaN | M
3 | 6650180 | 3565527 | 11/21/2016 | 13:05 | 2782525 | Occupant | Unspecified | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Notified Person | NaN | NaN | NaN
4 | 10255516 | 4231168 | 10/25/2019 | 11:16 | e038e18f-40fb-4471-99cf-345eae36e064 | Occupant | Unspecified | 19144329.0 | 7.0 | Not Ejected | Does Not Apply | Does Not Apply | Right rear passenger or motorcycle sidecar passenger | Lap Belt | NaN | NaN | Does Not Apply | Passenger | NaN | NaN | F
5 | 10253606 | 4230743 | 10/24/2019 | 19:15 | 84bcb3a7-d201-4c61-9e30-fe29268c1074 | Occupant | Injured | 19143343.0 | 27.0 | Not Ejected | Conscious | Back | Driver | Lap Belt & Harness | NaN | NaN | Complaint of Pain or Nausea | Driver | NaN | NaN | M
6 | 10251336 | 4230047 | 10/26/2019 | 16:45 | 21064a07-a945-49d0-af97-5446801b20ce | Occupant | Unspecified | 19142198.0 | 41.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Registrant | NaN | NaN | F
7 | 10248708 | 4229547 | 10/26/2019 | 1:15 | a8904763-2870-42f3-865c-b53d8e5156e2 | Pedestrian | Injured | NaN | 24.0 | NaN | Conscious | Shoulder - Upper Arm | NaN | NaN | Pedestrian/Bicyclist/Other Pedestrian at Intersection | Crossing With Signal | None Visible | Pedestrian | Unspecified | Unspecified | F
8 | 10250179 | 4229808 | 10/26/2019 | 13:04 | c3fc715e-203f-462d-9e8b-6a41fc378703 | Occupant | Unspecified | 19141630.0 | 36.0 | Not Ejected | Does Not Apply | Does Not Apply | Driver | Lap Belt & Harness | NaN | NaN | Does Not Apply | Driver | NaN | NaN | M
9 | 10253792 | 4230915 | 10/24/2019 | 8:20 | 793ac6c6-cbc7-4ab3-ab95-09f9312f1123 | Occupant | Unspecified | 19143438.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Registrant | NaN | NaN | U
index | UNIQUE_ID | COLLISION_ID | CRASH_DATE | CRASH_TIME | PERSON_ID | PERSON_TYPE | PERSON_INJURY | VEHICLE_ID | PERSON_AGE | EJECTION | EMOTIONAL_STATUS | BODILY_INJURY | POSITION_IN_VEHICLE | SAFETY_EQUIPMENT | PED_LOCATION | PED_ACTION | COMPLAINT | PED_ROLE | CONTRIBUTING_FACTOR_1 | CONTRIBUTING_FACTOR_2 | PERSON_SEX
5507741 | 13186862 | 4766011 | 10/22/2024 | 7:00 | c18830ff-d5e2-4e7b-8e17-f3a46f0026bb | Occupant | Unspecified | 20770941.0 | 74.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Registrant | NaN | NaN | M
5507742 | 13186976 | 4765943 | 10/16/2024 | 18:10 | 40c9066d-3fda-4736-8030-0cb6f64a5350 | Occupant | Injured | 20771004.0 | 29.0 | Not Ejected | Does Not Apply | Abdomen - Pelvis | Driver | Unknown | NaN | NaN | Complaint of Pain or Nausea | Driver | NaN | NaN | F
5507743 | 13183711 | 4765573 | 10/22/2024 | 17:48 | c6f97ee6-7fac-4d10-90e2-2b9deb753c53 | Occupant | Unspecified | 20769183.0 | 45.0 | Not Ejected | Does Not Apply | Does Not Apply | Front passenger, if two or more persons, including the driver, are in the front seat | Lap Belt | NaN | NaN | Does Not Apply | Passenger | NaN | NaN | M
5507744 | 13184537 | 4765466 | 10/22/2024 | 7:56 | 0676abdb-0804-4191-980f-e90b2faf293c | Occupant | Unspecified | 20769639.0 | 32.0 | Not Ejected | Does Not Apply | Does Not Apply | Driver | Lap Belt & Harness | NaN | NaN | Does Not Apply | Driver | NaN | NaN | F
5507745 | 13187048 | 4766009 | 10/21/2024 | 13:00 | b57d9cb0-783a-4fd3-82b2-266c1cdce5a1 | Occupant | Injured | 20771049.0 | 68.0 | Not Ejected | Conscious | Chest | Middle front seat, or passenger lying across a seat | Lap Belt | NaN | NaN | Complaint of Pain or Nausea | Passenger | NaN | NaN | F
5507746 | 13186556 | 4766107 | 10/05/2024 | 1:22 | bc772d91-50be-49e2-9447-2738b84bd314 | Occupant | Unspecified | 20770750.0 | 48.0 | Not Ejected | Does Not Apply | Does Not Apply | Driver | NaN | NaN | NaN | Does Not Apply | Driver | NaN | NaN | F
5507747 | 13186411 | 4766037 | 10/21/2024 | 15:16 | 67748140-9d98-4ea8-b2b1-fb29a5a6e1e8 | Occupant | Unspecified | 20770672.0 | 58.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Registrant | NaN | NaN | M
5507748 | 13185189 | 4765716 | 10/22/2024 | 20:13 | 8e79e3e6-d28b-44a3-aeb0-7336a93f09fe | Occupant | Unspecified | 20770000.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Registrant | NaN | NaN | NaN
5507749 | 13186716 | 4765904 | 10/21/2024 | 18:11 | 5236c0d4-92f0-4de0-b27d-07094ac6787b | Occupant | Unspecified | 20770846.0 | 0.0 | Not Ejected | Does Not Apply | Does Not Apply | Driver | Unknown | NaN | NaN | Does Not Apply | Passenger | NaN | NaN | U
5507750 | 13186243 | 4765996 | 10/20/2024 | 23:42 | e35a613f-0114-4882-a6e0-624fb2d663dc | Occupant | Unspecified | 20770587.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Registrant | NaN | NaN | NaN